Quantitative Analysis of Technical Debt and Pattern Violation in Large Language Model Architectures

Slater, Tyler

arXiv.org Artificial Intelligence

As Large Language Models (LLMs) transition from code completion tools to autonomous system architects, their impact on long-term software maintainability remains unquantified. While existing research benchmarks functional correctness (pass@k), this study presents the first empirical framework to measure "Architectural Erosion" and the accumulation of Technical Debt in AI-synthesized microservices. We conducted a comparative pilot study of three state-of-the-art models (GPT-5.1, Claude 4.5 Sonnet, and Llama 3 8B) by prompting them to implement a standardized Book Lending Microservice under strict Hexagonal Architecture constraints. Utilizing Abstract Syntax Tree (AST) parsing, we find that while proprietary models achieve high architectural conformance (0% violation rate for GPT-5.1), open-weights models exhibit critical divergence. Specifically, Llama 3 demonstrated an 80% Architectural Violation Rate, frequently bypassing interface adapters to create illegal circular dependencies between Domain and Infrastructure layers. Furthermore, we identified a phenomenon of "Implementation Laziness," where open-weights models generated 60% fewer Logical Lines of Code (LLOC) than their proprietary counterparts, effectively omitting complex business logic to satisfy token constraints. These findings suggest that without automated architectural linting, utilizing smaller open-weights models for system scaffolding accelerates the accumulation of structural technical debt.
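The paper's AST-based detection of layer-boundary violations can be sketched as follows. This is a minimal illustration, not the authors' actual tooling: the package names (`app.domain`, `app.infrastructure`, `app.adapters.outbound`) are hypothetical stand-ins for the Hexagonal Architecture layers, since the study's real package layout is not given in the abstract.

```python
import ast

# Hypothetical layer prefixes; the study's actual package layout is not specified.
FORBIDDEN_PREFIXES = ("app.infrastructure", "app.adapters.outbound")

def imported_modules(source: str):
    """Collect fully qualified module names imported by a Python source file."""
    tree = ast.parse(source)
    names = []
    for node in ast.walk(tree):
        if isinstance(node, ast.Import):
            names.extend(alias.name for alias in node.names)
        elif isinstance(node, ast.ImportFrom) and node.module:
            names.append(node.module)
    return names

def layer_violations(source: str):
    """Return imports that cross inward from the domain layer to infrastructure."""
    return [m for m in imported_modules(source)
            if m.startswith(FORBIDDEN_PREFIXES)]

# A domain-layer file that illegally reaches into the infrastructure layer.
domain_file = """
from app.domain.model import Book
from app.infrastructure.orm import BookRow   # illegal inward dependency
"""
print(layer_violations(domain_file))  # ['app.infrastructure.orm']
```

An Architectural Violation Rate in the paper's sense would then be the share of generated domain-layer files for which `layer_violations` is non-empty.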


approach to study GNN designs, the first quantitative analysis for GNN task similarity, and offers rigorous findings via 2

Neural Information Processing Systems

We thank the reviewers for their constructive feedback. We thank R2 and R3 for raising the concern that our paper lacks theoretical analysis. LU activation significantly improves GNN performance. We will add these new discussions to the revised paper. We thank the reviewers for suggesting other design dimensions to explore.



Digitizing Spermatogenesis Lineage at Nanoscale Resolution In Tissue-Level Electron Microscopy

Xiao, Li, Liu, Liqing, Wu, Hongjun, Zhong, Jiayi, Zhang, Yan, Hu, Junjie, Sun, Fei, Yang, Ge, Xu, Tao

arXiv.org Artificial Intelligence

School of Life Sciences, University of Chinese Academy of Sciences, Beijing 100049, China. # These authors contributed equally to this work. Email: andrewxiao@bupt.edu.cn; liuliqing@ibp.ac.cn; huj@ibp.ac.cn; feisun@ibp.ac.cn; yangge@ucas.edu.cn; xutao@ibp.ac.cn ABSTRACT Recent advances in 2D large-scale and 3D volume electron microscopy have stimulated the rapid development of nanoscale functional analysis at the tissue and organ levels. To meet the requirements of characterizing intracellular organelles and their interactions within defined cellular cohorts at the tissue level, we have developed DeepOrganelle. It adopts a lightweight Mask2Former framework as a universal segmentor and is capable of segmenting and extracting organelles within different cell types, performing statistical quantitative analysis, and visualizing and quantifying the spatial distribution of organelle morphologies and interactions across different cell types at tissue scale. Using DeepOrganelle, we systematically perform cross-scale quantification of membrane contact site (MCS) dynamics across the progression of the seminiferous epithelial cycle, covering 12 distinct developmental stages and 24 statuses of germ cells. Notably, it discovers a wave-like pattern of mitochondria (Mito)-endoplasmic reticulum (ER) contact with a significant increase specifically at Stage X pachytene preceding the transition to diplotene, which aligns well with a newly reported experiment showing that mitochondrial metabolic proteins such as PDHA2 are essential for this transition by maintaining ATP supply for double-strand break (DSB) repair.
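A common proxy for MCS extent from segmentation masks is the fraction of one organelle's pixels lying within a small distance of the other organelle. The sketch below illustrates this idea on toy binary masks; it is an assumption-laden stand-in, since DeepOrganelle's actual quantification pipeline is not described in the abstract.

```python
import numpy as np

def contact_fraction(mito: np.ndarray, er: np.ndarray, radius: int = 1) -> float:
    """Fraction of mitochondrion pixels within `radius` pixels of the ER mask.

    A crude proxy for membrane contact site (MCS) extent, shown for
    illustration only; DeepOrganelle's real pipeline is not specified here.
    """
    # Dilate the ER mask by OR-ing shifted copies over a (2r+1)^2 neighbourhood.
    # Note: np.roll wraps at the borders, which is fine for interior toy masks.
    near_er = np.zeros_like(er, dtype=bool)
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            near_er |= np.roll(np.roll(er, dy, axis=0), dx, axis=1)
    mito = mito.astype(bool)
    return float((mito & near_er).sum() / max(mito.sum(), 1))

# Toy 2D masks: a 2x2 mitochondrion beside a vertical ER tubule.
mito = np.zeros((6, 6), dtype=bool); mito[2:4, 1:3] = True
er   = np.zeros((6, 6), dtype=bool); er[2:4, 4] = True
print(contact_fraction(mito, er, radius=2))  # → 0.5
```

Tracking this fraction per cell type across the 12 stages would yield the kind of stage-resolved contact profile the paper reports.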





ArchiLense: A Framework for Quantitative Analysis of Architectural Styles Based on Vision Large Language Models

Zhong, Jing, Yin, Jun, Li, Peilin, Zeng, Pengyu, Zang, Miao, Luo, Ran, Lu, Shuai

arXiv.org Artificial Intelligence

Architectural cultures across regions are characterized by stylistic diversity, shaped by historical, social, and technological contexts in addition to geographical conditions. Understanding architectural styles requires the ability to describe and analyze the stylistic features of different architects from various regions through visual observations of architectural imagery. However, traditional studies of architectural culture have largely relied on subjective expert interpretations and historical literature reviews, often suffering from regional biases and limited explanatory scope. To address these challenges, this study proposes three core contributions: (1) We construct a professional architectural style dataset named ArchDiffBench, which comprises 1,765 high-quality architectural images and their corresponding style annotations, collected from different regions and historical periods. (2) We propose ArchiLense, an analytical framework grounded in Vision-Language Models and constructed using the ArchDiffBench dataset. By integrating advanced computer vision techniques, deep learning, and machine learning algorithms, ArchiLense enables automatic recognition, comparison, and precise classification of architectural imagery, producing descriptive language outputs that articulate stylistic differences. (3) Extensive evaluations show that ArchiLense achieves strong performance in architectural style recognition, with a 92.4% consistency rate with expert annotations and 84.5% classification accuracy, effectively capturing stylistic distinctions across images. The proposed approach transcends the subjectivity inherent in traditional analyses and offers a more objective and accurate perspective for comparative studies of architectural culture.
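Both reported figures (the 92.4% consistency rate with expert annotations and the 84.5% classification accuracy against ground truth) reduce to label agreement against a reference set. A minimal sketch, with made-up style labels for illustration:

```python
def agreement_rate(predicted, reference):
    """Share of items where the predicted style label matches the reference.

    The same computation gives a consistency rate (reference = expert
    annotations) or a classification accuracy (reference = ground truth).
    """
    if not reference:
        raise ValueError("reference labels must be non-empty")
    matches = sum(p == r for p, r in zip(predicted, reference))
    return matches / len(reference)

# Toy example with hypothetical style labels.
preds  = ["Gothic", "Baroque", "Gothic",     "Brutalist"]
expert = ["Gothic", "Baroque", "Romanesque", "Brutalist"]
print(agreement_rate(preds, expert))  # → 0.75
```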


Quantitative Analysis of Deeply Quantized Tiny Neural Networks Robust to Adversarial Attacks

Zakariyya, Idris, Ayaz, Ferheen, Kharbouche-Harrari, Mounia, Singer, Jeremy, Keoh, Sye Loong, Pau, Danilo, Cano, José

arXiv.org Artificial Intelligence

Reducing the memory footprint of Machine Learning (ML) models, especially Deep Neural Networks (DNNs), is imperative to facilitate their deployment on resource-constrained edge devices. However, a notable drawback of DNN models lies in their susceptibility to adversarial attacks, wherein minor input perturbations can deceive them. A primary challenge is therefore the development of accurate, resilient, and compact DNN models suitable for deployment on resource-constrained edge devices. This paper presents a compact DNN model that exhibits resilience against both black-box and white-box adversarial attacks, achieved through training with the QKeras quantization-aware training framework. The study explores the potential of QKeras and an adversarial robustness technique, Jacobian Regularization (JR), to co-optimize the DNN architecture through a per-layer JR methodology. Based on this co-optimization strategy, the paper devises a DNN model employing Stochastic Ternary Quantization (STQ) and compares its performance against existing DNN models under various white-box and black-box attacks. The experimental findings revealed that the proposed DNN model has a small footprint and, on average, outperforms the Quanos and DS-CNN MLCommons/TinyML (MLC/T) benchmarks when challenged with white-box and black-box attacks, respectively, on the CIFAR-10 image and Google Speech Commands audio datasets.
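Jacobian Regularization penalizes the squared Frobenius norm of the input-output Jacobian, so that small input perturbations produce small output changes. The sketch below estimates that norm by finite differences on a toy function; it is illustrative only and does not reproduce the paper's per-layer QKeras integration.

```python
import numpy as np

def jacobian_frobenius_sq(f, x, eps=1e-4):
    """Finite-difference estimate of ||J_f(x)||_F^2.

    Jacobian Regularization (JR) adds a multiple of this quantity to the
    task loss; a smaller norm means a flatter response to input
    perturbations, which underlies the adversarial robustness. (Toy
    estimator; the paper applies JR per layer during quantization-aware
    training with QKeras.)
    """
    x = np.asarray(x, dtype=float)
    total = 0.0
    for i in range(x.size):
        e = np.zeros_like(x)
        e[i] = eps
        col = (f(x + e) - f(x - e)) / (2 * eps)  # column i of the Jacobian
        total += float(np.sum(col ** 2))
    return total

# Toy linear "layer": J = W, so ||J||_F^2 = 1 + 4 + 9 + 16 = 30.
W = np.array([[1.0, 2.0], [3.0, 4.0]])
f = lambda x: W @ x
print(jacobian_frobenius_sq(f, np.zeros(2)))  # ≈ 30.0
```

In practice JR implementations avoid the full column-by-column sweep by projecting the Jacobian onto random unit vectors, but the quantity being regularized is the same.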


Microscopic Analysis on LLM players via Social Deduction Game

Kim, Byungjun, Seo, Dayeon, Kim, Bugeun

arXiv.org Artificial Intelligence

Recent studies have begun developing autonomous game players for social deduction games using large language models (LLMs). When building LLM players, fine-grained evaluations are crucial for addressing weaknesses in game-playing abilities. However, existing studies have often overlooked such assessments. Specifically, we point out two issues with the evaluation methods employed. First, game-playing abilities have typically been assessed through game-level outcomes rather than specific event-level skills; Second, error analyses have lacked structured methodologies. To address these issues, we propose an approach utilizing a variant of the SpyFall game, named SpyGame. We conducted an experiment with four LLMs, analyzing their gameplay behavior in SpyGame both quantitatively and qualitatively. For the quantitative analysis, we introduced eight metrics to resolve the first issue, revealing that these metrics are more effective than existing ones for evaluating the two critical skills: intent identification and camouflage. In the qualitative analysis, we performed thematic analysis to resolve the second issue. This analysis identifies four major categories that affect gameplay of LLMs. Additionally, we demonstrate how these categories complement and support the findings from the quantitative analysis.


Quantitative Analysis of AI-Generated Texts in Academic Research: A Study of AI Presence in Arxiv Submissions using AI Detection Tool

Akram, Arslan

arXiv.org Artificial Intelligence

Many people are interested in ChatGPT since it has become a prominent AIGC model that provides high-quality responses in various contexts, such as software development and maintenance. Despite its immense potential, misuse of ChatGPT might cause significant issues, particularly in public safety and education. The majority of researchers choose to publish their work on Arxiv, and the effectiveness and originality of future work depend on the ability to detect AI-generated components in such contributions. To address this need, this study analyzes a method for detecting purposely manufactured content in academic submissions posted to Arxiv. A dataset was created using physics, mathematics, and computer science articles, and the detection tool Originality.ai was then evaluated on it. The statistical analysis shows that Originality.ai is very accurate, with an accuracy rate of 98%.